A Speech Corpus for Modeling Language Acquisition: CAREGIVER
Abstract
CAREGIVER, a multilingual speech corpus for modeling language acquisition, has been designed and recorded within the framework of the EU-funded Acquisition of Communication and Recognition Skills (ACORNS) project. The paper describes the motivation behind the corpus and its design, which draws on current knowledge of infant language acquisition. Instead of recording infants and children, the voices of their primary and secondary caregivers were captured as read speech in both infant-directed and adult-directed modes across four languages. The paper covers the challenges and methods involved in obtaining prompts of similar complexity and semantics across languages, as well as the normalized recording procedures employed at the different locations. The corpus contains nearly 66,000 utterance-based audio files spoken over a two-year period by 17 male and 17 female native speakers of Dutch, English, Finnish, and Swedish. An orthographic transcription is available for every utterance, and time-aligned word and phone annotations exist for many of the sub-corpora. The CAREGIVER corpus will be published via ELRA.
Similar resources
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
Language model acquisition from a text corpus for speech understanding
Speech understanding can be viewed as a problem of translating input natural language of speech recognition results into output semantic language. This paper describes automatic acquisition of a language model for translating natural language into semantic language from a text corpus using a stochastic method. The method estimates co-occurrence probabilities of input and output grammar rules as...
Effects of Caregiver Prosody on Child Language Acquisition
This paper investigates the role of prosody in one child’s lexical acquisition using an ecologically valid, high-density, longitudinal corpus. The corpus consists of high fidelity recordings collected from microphones embedded throughout the home of a family with a young child. We analyze data collected continuously from ages 9 – 24 months, including the child’s first productive use of language...
Visually Grounded Virtual Accelerometers: A Longitudinal Video Investigation of Dyadic Bodily Dynamics around the time of Word Acquisition
Human movement encodes information about internal states and goals. When these goals involve dyadic interactions, such as in language acquisition, the nature of the movement and proximity become representative, allowing parts of our internal states to manifest. We propose an approach called Visually Grounded Virtual Accelerometers (VGVA), to aid with ecologically-valid video analysis investigat...
First steps in building a large vocabulary continuous speech recognition system for Vietnamese
This paper presents an overview of our activities for building a Large Vocabulary Continuous Speech Recognition (LVCSR) system for Vietnamese implemented at CLIPS-IMAG Laboratory (France) and International Research Center MICA (Vietnam). Firstly, a new methodology for fast text corpora acquisition for minority languages which has been applied to Vietnamese is proposed. Secondly, the first resul...
Publication date: 2010